Abstract

We consider estimation procedures which are recursive in the sense
that each successive estimator is obtained from the previous one by a
simple adjustment. We study the rate of convergence of recursive
estimation procedures for the general statistical model.

1 Introduction

Let $X_1, \dots, X_n$ be random variables with a joint distribution depending
on a real unknown parameter $\theta$. Then an M-estimator of $\theta$ is
defined as a solution of the estimating equation

(1.1)   $\sum_{i=1}^{n} \psi_i(v) = 0,$

where $\psi_i(v) = \psi_i(X_{i-k}^{i}; v)$ $(i = 1, 2, \dots, n)$ are suitably
chosen functions and $X_{i-k}^{i} = (X_{i-k}, \dots, X_i)$ is the vector of past
and present observations at step (time) $i$. For instance, if the $X_i$'s are
observations from a discrete time Markov process, then one can assume that
$k = 1$. If the observations are i.i.d., then we take $k = 0$, so that
$\psi_i(v) = \psi_i(X_i; v)$. In general, if no restrictions are made on the
dependence structure of the process $X_i$, one may need to consider
$\psi$-functions depending on the vector of all past and present observations
of the process (that is, $k = i - 1$). If the conditional probability density
function (or probability function) of the observation $X_i$, given
$X_{i-k}, \dots, X_{i-1}$, is
$f_i(x, \theta) = f_i(x, \theta \mid X_{i-k}, \dots, X_{i-1})$, then one can
obtain an MLE (maximum likelihood estimator) on choosing
$\psi_i(v) = f_i'(X_i, v) / f_i(X_i, v)$. Besides MLEs, the class of
M-estimators includes estimators with special properties such as robustness.
Under certain regularity and ergodicity conditions it can be proved that there
exists a consistent sequence of solutions of (1.1) which has the property of
local asymptotic linearity. (See, e.g., Serfling (1980), Huber (1981), Lehmann
(1983). A comprehensive bibliography can be found in Launer and Wilkinson
(1979), Hampel et al. (1986), Rieder (1994), and Jurečková and Sen (1996).)

If the $\psi$-functions are nonlinear, it is rather difficult to work with the
corresponding estimating equations. In this paper we consider estimation
procedures which are recursive in the sense that each successive estimator is
obtained from the previous one by a simple adjustment. In particular, we
consider a class of estimators

(1.2)   $\hat\theta_n = \hat\theta_{n-1} + \Gamma_n^{-1}(\hat\theta_{n-1}) \psi_n(\hat\theta_{n-1}), \quad n \ge 1,$

where $\psi_n$ is a suitably chosen vector process, $\Gamma_n$ is a (possibly
random) normalizing matrix process and $\hat\theta_0 \in \mathbb{R}^m$ is some
initial point. (See the introduction in Sharia (2006) for a detailed discussion
and a heuristic justification of this estimation procedure.)
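
The recursion (1.2) can be sketched in a few lines of code. The following is a minimal illustration for a scalar parameter, assuming that $\psi_n$ depends only on the current observation; the names `recursive_estimate`, `psi` and `gamma` are hypothetical and are not taken from the paper. With $\psi_n(v) = X_n - v$ and $\Gamma_n = n$ (an illustrative choice), the recursion reproduces the sample mean for any initial point.

```python
from typing import Callable, Iterable


def recursive_estimate(
    xs: Iterable[float],
    psi: Callable[[int, float, float], float],
    gamma: Callable[[int, float], float],
    theta0: float,
) -> float:
    """Scalar sketch of recursion (1.2).

    psi(n, x, theta)  plays the role of psi_n evaluated at the previous estimate,
    gamma(n, theta)   plays the role of the normalizing process Gamma_n.
    Both signatures are assumptions made for this illustration.
    """
    theta = theta0
    for n, x in enumerate(xs, start=1):
        # theta_n = theta_{n-1} + Gamma_n^{-1}(theta_{n-1}) * psi_n(theta_{n-1})
        theta = theta + psi(n, x, theta) / gamma(n, theta)
    return theta


# Illustrative choice: psi_n(v) = x - v and Gamma_n = n give the running mean.
data = [2.0, 4.0, 9.0, 1.0]
estimate = recursive_estimate(
    data,
    psi=lambda n, x, theta: x - theta,
    gamma=lambda n, theta: float(n),
    theta0=100.0,
)
```

Each step uses only the previous estimate and the new observation, which is the "simple adjustment" referred to above.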

In i.i.d. models, estimating procedures similar to (1.2) have been studied
by a number of authors using methods of stochastic approximation theory
(see, e.g., Khas'minskii and Nevelson (1972), Fabian (1978), Ljung and
Söderström (1987), Ljung, Pflug and Walk (1992), and references therein).
Some work has been done for non-i.i.d. models as well. In particular,
Englund, Holst, and Ruppert (1989) give asymptotic representation results
for a certain type of $X_n$ processes. In Sharia (1998) theoretical results on
convergence, rate of convergence and the asymptotic representation are given
under certain regularity and ergodicity assumptions on the model, in the
one-dimensional case with
$\psi_n(x, \theta) = \frac{\partial}{\partial\theta} \log f_n(x, \theta)$
(see also Campbell (1982), Sharia (1997), Lazrieva and Toronjadze (1987)).

In Sharia (2006), imposing "global" restrictions on the processes $\psi$ and
$\Gamma$, we study "global" convergence of the recursive estimators (1.2), that
is, convergence for an arbitrary starting point $\hat\theta_0$. In the present
paper, we present results on the rate of convergence and demonstrate the use
of these results on some examples.

2 Notation and preliminaries



Let $X_t$, $t = 1, 2, \dots$, be observations taking values in a measurable
space $(\mathbf{X}, \mathcal{B}(\mathbf{X}))$ equipped with a $\sigma$-finite
measure $\mu$. Suppose that the distribution of the process $X_t$ depends on an
unknown parameter $\theta \in \Theta$, where $\Theta$ is an open subset of the
$m$-dimensional Euclidean space $\mathbb{R}^m$. Suppose also that for each
$t = 1, 2, \dots$, there exists a regular conditional probability density of
$X_t$ given the values of the past observations $X_{t-1}, \dots, X_2, X_1$,
which will be denoted by

$f_t(\theta, x_t \mid x_1^{t-1}) = f_t(\theta, x_t \mid x_{t-1}, \dots, x_1),$

where $f_1(\theta, x_1 \mid x_1^{0}) = f_1(\theta, x_1)$ is the probability
density of the random variable $X_1$. Without loss of generality we assume that
all random variables are defined on a probability space $(\Omega, \mathcal{F})$
and denote by $\{P^\theta, \theta \in \Theta\}$ the family of the corresponding
distributions on $(\Omega, \mathcal{F})$.

Let $\mathcal{F}_t = \sigma(X_1, \dots, X_t)$ be the $\sigma$-field generated
by the random variables $X_1, \dots, X_t$. By
$(\mathbb{R}^m, \mathcal{B}(\mathbb{R}^m))$ we denote the $m$-dimensional
Euclidean space with the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^m)$.
Transposition of matrices and vectors is denoted by $T$. By $(u, v)$ we denote
the standard scalar product of $u, v \in \mathbb{R}^m$, that is,
$(u, v) = u^T v$.

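Suppose that $h$ is a real valued function defined on
$\Theta \subset \mathbb{R}^m$. We denote by $\dot h(\theta)$ the row-vector of
partial derivatives of $h(\theta)$ with respect to the components of $\theta$,
that is,

$\dot h(\theta) = \left( \frac{\partial}{\partial\theta^1} h(\theta), \dots, \frac{\partial}{\partial\theta^m} h(\theta) \right).$

If for each $t = 1, 2, \dots$, the derivative
$\dot f_t(\theta, x_t \mid x_1^{t-1})$ w.r.t. $\theta$ exists, then we can
define the function

$l_t(\theta, x_t \mid x_1^{t-1}) = \frac{1}{f_t(\theta, x_t \mid x_1^{t-1})} \dot f_t^{T}(\theta, x_t \mid x_1^{t-1})$

with the convention $0/0 = 0$.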

The one step conditional Fisher information matrix for $t = 1, 2, \dots$ is
defined as

$i_t(\theta \mid x_1^{t-1}) = \int l_t(\theta, z \mid x_1^{t-1}) \, l_t^{T}(\theta, z \mid x_1^{t-1}) \, f_t(\theta, z \mid x_1^{t-1}) \, \mu(dz).$

We shall use the notation

$f_t(\theta) = f_t(\theta, X_t \mid X_1^{t-1}), \quad l_t(\theta) = l_t(\theta, X_t \mid X_1^{t-1}), \quad i_t(\theta) = i_t(\theta \mid X_1^{t-1}).$

Note that the process $i_t(\theta)$ is "predictable", that is, the random
variable $i_t(\theta)$ is $\mathcal{F}_{t-1}$ measurable for each $t \ge 1$.

Note also that by definition, $i_t(\theta)$ is a version of the conditional
expectation w.r.t. $\mathcal{F}_{t-1}$, that is,

$i_t(\theta) = E_\theta \{ l_t(\theta) l_t^{T}(\theta) \mid \mathcal{F}_{t-1} \}.$

Everywhere in the present work conditional expectations are meant to be
calculated as integrals w.r.t. the conditional probability densities.
The conditional Fisher information at time $t$ is

$I_t(\theta) = \sum_{s=1}^{t} i_s(\theta), \quad t = 1, 2, \dots.$

If the $X_t$'s are independent random variables, $I_t(\theta)$ reduces to the
standard Fisher information matrix. Sometimes $I_t(\theta)$ is referred to as
the incremental expected Fisher information. A detailed discussion of this
concept and related work appears in Barndorff-Nielsen and Sørensen (1994), and
Prakasa Rao (1999), Ch. 3.

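We say that $\psi = \{\psi_t(\theta, X_t, X_{t-1}, \dots, X_1)\}_{t \ge 1}$ is
a sequence of estimating functions and write $\psi \in \Psi$, if for each
$t \ge 1$, $\psi_t(\theta, x_t, x_{t-1}, \dots, x_1) : \Theta \times \mathbf{X}^t \to \mathbb{R}^m$
is a Borel function.

Note that $\{l_t(\theta, X_t \mid X_1^{t-1})\}_{t \ge 1} \in \Psi$ and a ML
recursive procedure is given by

$\hat\theta_t = \hat\theta_{t-1} + I_t^{-1}(\hat\theta_{t-1}) \, l_t(\hat\theta_{t-1}), \quad t \ge 1.$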

Convention. Everywhere in the present work $\theta \in \mathbb{R}^m$ is an
arbitrary but fixed value of the parameter. Convergence and all relations
between random variables are meant with probability one w.r.t. the measure
$P^\theta$ unless specified otherwise. A sequence of random variables
$(\xi_t)_{t \ge 1}$ has some property eventually if for every $\omega$ in a set
$\Omega^\theta$ of $P^\theta$ probability 1, $\xi_t$ has this property for all
$t$ greater than some $t_0(\omega) < \infty$.

3 Main results 

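Suppose that $\psi \in \Psi$ and $\Gamma_t(\theta)$, for each
$\theta \in \mathbb{R}^m$, is a predictable $m \times m$ matrix process with
$\det \Gamma_t(\theta) \ne 0$, $t \ge 1$. Consider the estimator
$\hat\theta_t$ defined by

(3.1)   $\hat\theta_t = \hat\theta_{t-1} + \Gamma_t^{-1}(\hat\theta_{t-1}) \psi_t(\hat\theta_{t-1}), \quad t \ge 1,$

where $\hat\theta_0 \in \mathbb{R}^m$ is an arbitrary initial point.

Let $\theta \in \mathbb{R}^m$ be an arbitrary but fixed value of the parameter
and for any $u \in \mathbb{R}^m$ define

$b_t(\theta, u) = E_\theta \{ \psi_t(\theta + u) \mid \mathcal{F}_{t-1} \}.$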

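Lemma 3.1 Let $\{C_t(\theta)\}$ be a symmetric predictable $m \times m$ matrix
process such that $C_t(\theta)$ is non-negative definite for $t = 1, 2, \dots$.
Denote $\Delta_t = \hat\theta_t - \theta$,
$V_t(u) = (C_t(\theta) u, u)$ and $\Delta V_t(u) = V_t(u) - V_{t-1}(u)$.
Suppose that

(3.2)   $\sum_{t=1}^{\infty} (1 + V_{t-1}(\Delta_{t-1}))^{-1} [\mathcal{K}_t(\theta)]^{+} < \infty, \quad P^\theta\text{-a.s.},$

where

(3.3)   $\mathcal{K}_t(\theta) = \Delta V_t(\Delta_{t-1}) + 2 \left( C_t(\theta) \Delta_{t-1}, \Gamma_t^{-1}(\theta + \Delta_{t-1}) b_t(\theta, \Delta_{t-1}) \right)$
        $+ E_\theta \{ [\Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1})]^{T} C_t(\theta) \Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1}) \mid \mathcal{F}_{t-1} \}.$

Then $V_t(\Delta_t)$ converges ($P^\theta$-a.s.) to a finite limit.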

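Proof. As always (see the convention in Section 2), convergence and all
relations between random variables are meant with probability one w.r.t.
the measure $P^\theta$ unless specified otherwise. To simplify notation we
drop the argument or the index $\theta$ in some of the expressions below.
Rewrite (3.1) in the form

$\Delta_t = \Delta_{t-1} + \Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1}).$

By the Taylor expansion,

$V_t(\Delta_t) = V_t(\Delta_{t-1}) + \dot V_t(\Delta_{t-1}) \Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1})$
$+ \frac{1}{2} [\Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1})]^{T} \ddot V_t(\tilde\Delta_t) \Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1}).$

Since $\dot V_t(u) = 2 u^T C_t$ and $\ddot V_t(u) = 2 C_t$, we obtain

$V_t(\Delta_t) = V_t(\Delta_{t-1}) + 2 \left( C_t \Delta_{t-1}, \Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1}) \right)$
$+ [\Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1})]^{T} C_t \Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1}).$

Since

$V_t(\Delta_{t-1}) = V_{t-1}(\Delta_{t-1}) + \Delta V_t(\Delta_{t-1}),$

and since $C_t$, $\Delta_{t-1}$ and $\Gamma_t^{-1}(\theta + \Delta_{t-1})$ are
$\mathcal{F}_{t-1}$ measurable, taking conditional expectations gives

$E_\theta \{ V_t(\Delta_t) \mid \mathcal{F}_{t-1} \} = V_{t-1}(\Delta_{t-1}) + \mathcal{K}_t.$

Then, using the obvious decomposition
$\mathcal{K}_t = [\mathcal{K}_t]^{+} - [\mathcal{K}_t]^{-}$, the previous
equality can be rewritten as

$E_\theta \{ V_t(\Delta_t) \mid \mathcal{F}_{t-1} \} = V_{t-1}(\Delta_{t-1})(1 + B_t) + B_t - [\mathcal{K}_t]^{-},$

where $B_t = (1 + V_{t-1}(\Delta_{t-1}))^{-1} [\mathcal{K}_t]^{+}$. Since, by
(3.2), $\sum_{t=1}^{\infty} B_t < \infty$, the assertion of the lemma follows
immediately on application of Lemma A1 in Appendix A (with
$X_n = V_n(\Delta_n)$, $\beta_{n-1} = \xi_{n-1} = B_n$ and
$\zeta_{n-1} = [\mathcal{K}_n]^{-}$).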
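
The exact second-order expansion of the quadratic form used in the proof can be checked numerically. The sketch below is only an illustration with arbitrarily chosen numbers: `C` stands for a symmetric matrix $C_t$, `delta_prev` for $\Delta_{t-1}$ and `step` for $\Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1})$; the helper name `bilinear` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def bilinear(C: Matrix, u: Vector, v: Vector) -> float:
    """Return the scalar product (C u, v)."""
    return sum(C[i][j] * u[j] * v[i] for i in range(len(v)) for j in range(len(u)))


# Arbitrary illustrative values (not taken from the paper).
C = [[2.0, 0.5], [0.5, 1.0]]   # symmetric, non-negative definite
delta_prev = [0.3, -0.7]       # Delta_{t-1}
step = [-0.1, 0.2]             # Gamma_t^{-1} psi_t
delta_new = [a + b for a, b in zip(delta_prev, step)]

# V(Delta_t) versus V(Delta_{t-1}) + 2 (C Delta_{t-1}, step) + (C step, step)
lhs = bilinear(C, delta_new, delta_new)
rhs = (
    bilinear(C, delta_prev, delta_prev)
    + 2.0 * bilinear(C, delta_prev, step)
    + bilinear(C, step, step)
)
```

The two sides agree up to rounding because $V_t$ is quadratic, so the Taylor expansion has no remainder beyond the second-order term.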

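Corollary 3.1 Let $\{a_t(\theta)\}$ be a predictable non-decreasing scalar
process such that $a_t(\theta) \to \infty$ as $t \to \infty$. Denote
$\Delta a_t(\theta) = a_t(\theta) - a_{t-1}(\theta)$ and suppose that

(R1) $\lim_{t \to \infty} \frac{\Delta a_t(\theta)}{a_{t-1}(\theta)} = 0, \quad P^\theta\text{-a.s.};$

(R2) there exist a symmetric and non-negative definite matrix $C_\theta$ and a
predictable non-negative scalar process $\mathcal{P}_t$ such that

(3.4)   $2 \left( C_\theta \Delta_{t-1}, \Gamma_t^{-1}(\theta + \Delta_{t-1}) b_t(\theta, \Delta_{t-1}) \right) + \mathcal{P}_t \le -\lambda_t(\theta) (C_\theta \Delta_{t-1}, \Delta_{t-1}),$

eventually, where $\{\lambda_t(\theta)\}$ is a predictable scalar process
satisfying

(3.5)   $\sum_{s=1}^{\infty} \left[ \frac{\Delta a_s(\theta)}{a_s(\theta)} - \lambda_s(\theta) \right]^{+} < \infty, \quad P^\theta\text{-a.s.};$

(R3) for each $0 < \varepsilon < 1$ and the process $\mathcal{P}_t$ defined in
(R2),

$\sum_{s=1}^{\infty} a_s^{\varepsilon}(\theta) \left[ E_\theta \{ \| \Gamma_s^{-1}(\theta + \Delta_{s-1}) \psi_s(\theta + \Delta_{s-1}) \|^2 \mid \mathcal{F}_{s-1} \} - \mathcal{P}_s \right]^{+} < \infty, \quad P^\theta\text{-a.s.}$

Then $a_t^{2\delta}(\theta) (\hat\theta_t - \theta)^{T} C_\theta (\hat\theta_t - \theta) \to 0$
($P^\theta$-a.s.) for any $\delta \in \, ]0, 1/2[$.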

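Proof. As always (see the convention in Section 2), convergence and all
relations between random variables are meant with probability one w.r.t.
the measure $P^\theta$ unless specified otherwise. Let us check the conditions
of Lemma 3.1 for $C_t(\theta) = C_\theta (a_t(\theta))^{2\delta}$,
$\delta \in \, ]0, 1/2[$. To simplify notation we drop the fixed argument or the
index $\theta$ in some of the expressions below. Denote

$r_t = (\Delta a_t^{2\delta} - a_t^{2\delta} \lambda_t) / a_{t-1}^{2\delta}$

and

$P_t = a_t^{2\delta} (\mathcal{E}_t - \mathcal{P}_t),$

where $\Delta a_t^{2\delta} = a_t^{2\delta} - a_{t-1}^{2\delta}$ and

$\mathcal{E}_t = E_\theta \{ [\Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1})]^{T} C \, [\Gamma_t^{-1}(\theta + \Delta_{t-1}) \psi_t(\theta + \Delta_{t-1})] \mid \mathcal{F}_{t-1} \}.$

By (R2), for $\mathcal{K}_t$ defined in (3.3) we have

$\mathcal{K}_t = \Delta a_t^{2\delta} (C \Delta_{t-1}, \Delta_{t-1}) + 2 a_t^{2\delta} \left( C \Delta_{t-1}, \Gamma_t^{-1}(\theta + \Delta_{t-1}) b_t(\theta, \Delta_{t-1}) \right) + a_t^{2\delta} \mathcal{E}_t$
$\le (\Delta a_t^{2\delta} - a_t^{2\delta} \lambda_t) (C \Delta_{t-1}, \Delta_{t-1}) + P_t$
$= r_t (a_{t-1}^{2\delta} C \Delta_{t-1}, \Delta_{t-1}) + P_t.$

Since $C$ is non-negative definite,

$(1 + V_{t-1}(\Delta_{t-1}))^{-1} [\mathcal{K}_t]^{+} = (1 + (a_{t-1}^{2\delta} C \Delta_{t-1}, \Delta_{t-1}))^{-1} [\mathcal{K}_t]^{+} \le [r_t]^{+} + [P_t]^{+}.$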

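By (R3), $\sum_{t=1}^{\infty} [P_t]^{+} < \infty$, so (3.2) holds provided that
$\sum_{t=1}^{\infty} [r_t]^{+} < \infty$. Since
$\Delta a_t^{2\delta} = a_t^{2\delta} - a_{t-1}^{2\delta}$, we can rewrite
$r_t$ as

$r_t = (a_t a_{t-1}^{-1})^{2\delta} (1 - \lambda_t) - 1.$

Also, since $(1 + x)^{2\delta} = 1 + 2\delta x + O(x^2)$, we have

$(a_t a_{t-1}^{-1})^{2\delta} = \left( 1 + \frac{\Delta a_t}{a_{t-1}} \right)^{2\delta} = 1 + \frac{\Delta a_t}{a_{t-1}} \left( 2\delta + \delta_t^{(1)} \right),$

where, by (R1), $\delta_t^{(1)} = O(\Delta a_t / a_{t-1}) \to 0$ as
$t \to \infty$. Denote

$\eta_t = \Delta a_t / a_t - \lambda_t.$

Then $1 - \lambda_t = 1 - \Delta a_t / a_t + \eta_t$, and simple calculations
show that

$r_t \le (a_t a_{t-1}^{-1})^{2\delta} \left( 1 - \frac{\Delta a_t}{a_t} + [\eta_t]^{+} \right) - 1$
$= \frac{\Delta a_t}{a_{t-1}} \left( 2\delta + \delta_t^{(1)} \right) - (a_t a_{t-1}^{-1})^{2\delta} \frac{\Delta a_t}{a_t} + (a_t a_{t-1}^{-1})^{2\delta} [\eta_t]^{+}$
$= -\frac{\Delta a_t}{a_t} \left( (1 - 2\delta) + \delta_t^{(2)} \right) + \delta_t^{(3)},$

where

$\delta_t^{(2)} = (a_t a_{t-1}^{-1})^{2\delta} - 1 - 2\delta \frac{\Delta a_t}{a_{t-1}} - \frac{a_t}{a_{t-1}} \delta_t^{(1)}, \qquad \delta_t^{(3)} = (a_t a_{t-1}^{-1})^{2\delta} [\eta_t]^{+}.$

From (R1) and (R2), $\delta_t^{(2)} \to 0$ and
$\sum_{t=1}^{\infty} |\delta_t^{(3)}| < \infty$. Then, since $1 - 2\delta > 0$,
we obtain that $[r_t]^{+} \le |\delta_t^{(3)}|$ eventually. It therefore follows
that the conditions of Lemma 3.1 are satisfied, implying that
$a_t^{2\delta} (\hat\theta_t - \theta)^{T} C (\hat\theta_t - \theta)$ converges
to a finite limit. Finally, since this holds for an arbitrary
$\delta \in \, ]0, 1/2[$ and $a_t \to \infty$, the result follows.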
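
The role of the restriction $\delta < 1/2$ can be seen numerically from the formula $r_t = (a_t / a_{t-1})^{2\delta} (1 - \lambda_t) - 1$. The sketch below assumes the illustrative choice $a_t = t$ and $\lambda_t = 1/t$ (the values used later for the i.i.d. scheme); the function name `r` is hypothetical. In this case $r_t = (t/(t-1))^{2\delta - 1} - 1$, which is negative exactly when $\delta < 1/2$.

```python
def r(t: int, delta: float) -> float:
    """r_t = (a_t / a_{t-1})^{2*delta} * (1 - lambda_t) - 1 for a_t = t, lambda_t = 1/t."""
    return (t / (t - 1)) ** (2 * delta) * (1 - 1 / t) - 1


# For delta in ]0, 1/2[ every r_t is non-positive, so the positive parts vanish.
values = {d: [r(t, d) for t in range(2, 2001)] for d in (0.1, 0.25, 0.49)}
all_nonpositive = all(v <= 0 for vs in values.values() for v in vs)

# For delta > 1/2 the sign flips, and the summability argument no longer applies.
r_beyond_half = r(100, 0.6)
```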

Remark 3.1 Note that the first term in the left hand side of (3.4) is
usually negative and, assuming that $\mathcal{P}_t = 0$, the positive parts in
(3.5) are usually zero (or quite small) in many examples. On the other hand,
the choice $\mathcal{P}_t = 0$ means that (R3) becomes more restrictive,
imposing stronger probabilistic restrictions on the model. The choice
$\mathcal{P}_t = 0$ is natural in the i.i.d. case since all the required
probabilistic conditions are in this case automatically satisfied (see also
Remark 3.2). Now, if the first term in the left hand side of (3.4) is negative
with a "high enough" absolute value, then it may be possible to introduce a
non-zero $\mathcal{P}_t$ without jeopardising (3.5). One possibility might be
$\mathcal{P}_t = \| \Gamma_t^{-1}(\theta + \Delta_{t-1}) b_t(\theta, \Delta_{t-1}) \|^2$.
Also, in this case, since
$b_t(\theta, u) = E_\theta \{ \psi_t(\theta + u) \mid \mathcal{F}_{t-1} \}$ and
$\Gamma_t^{-1}(\theta + u)$ are predictable processes, the condition in (R3)
can be rewritten as

$\sum_{s=1}^{\infty} a_s^{\varepsilon}(\theta) E_\theta \{ \| \Gamma_s^{-1}(\theta + \Delta_{s-1}) \{ \psi_s(\theta + \Delta_{s-1}) - b_s(\theta, \Delta_{s-1}) \} \|^2 \mid \mathcal{F}_{s-1} \} < \infty.$

Remark 3.2 Consider the i.i.d. case with

$f_t(\theta, z \mid x_1^{t-1}) = f(\theta, z), \quad \psi_t(\theta) = \psi(\theta, z)|_{z = X_t},$

where $\int \psi(\theta, z) f(\theta, z) \mu(dz) = 0$ and
$\Gamma_t(\theta) = t \gamma(\theta)$ for some invertible non-random matrix
$\gamma(\theta)$. Then

$b_t(\theta, u) = b(\theta, u) = \int \psi(\theta + u, z) f(\theta, z) \mu(dz),$

implying that $b_t(\theta, 0) = 0$. Denote $\Delta_t = \hat\theta_t - \theta$
and rewrite (3.1) in the form

(3.6)   $\Delta_t = \Delta_{t-1} + \frac{1}{t} \left( \gamma^{-1}(\theta + \Delta_{t-1}) b(\theta, \Delta_{t-1}) + \varepsilon_t^{\theta} \right),$

where

$\varepsilon_t^{\theta} = \gamma^{-1}(\theta + \Delta_{t-1}) \{ \psi(\theta + \Delta_{t-1}, X_t) - b(\theta, \Delta_{t-1}) \}.$

Equation (3.6) defines a Robbins-Monro stochastic approximation procedure
that converges to the solution of the equation

$R^{\theta}(u) := \gamma^{-1}(\theta + u) b(\theta, u) = 0,$

when the values of the function $R^{\theta}(u)$ can only be observed with zero
expectation errors $\varepsilon_t^{\theta}$. Note that in general, recursion
(3.1) cannot be considered in the framework of classical stochastic
approximation theory (see Lazrieva, Sharia, and Toronjadze (1997, 2003) for the
generalized Robbins-Monro stochastic approximation procedures). For the i.i.d.
case, the conditions of Corollary 3.1 can be written as (B1) and (B2) in
Corollary 4.1 (see also Remark 4.1), which are standard assumptions for
stochastic approximation procedures of type (3.6) (see, e.g., Robbins and Monro
(1951), Gladyshev (1965), Khas'minskii and Nevelson (1972), Ljung and
Söderström (1987), Ljung, Pflug and Walk (1992)).

4 SPECIAL MODELS AND EXAMPLES 

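1. The i.i.d. scheme. Consider the classical scheme of i.i.d. observations
$X_1, X_2, \dots$, with a common probability density/mass function
$f(\theta, x)$, $\theta \in \mathbb{R}^m$. Suppose that $\psi(\theta, z)$ is an
estimating function with

$E_\theta \{ \psi(\theta, X_1) \} = \int \psi(\theta, z) f(\theta, z) \mu(dz) = 0.$

Let us define the recursive estimator $\hat\theta_t$ by

(4.1)   $\hat\theta_t = \hat\theta_{t-1} + \frac{1}{t} \gamma^{-1}(\hat\theta_{t-1}) \psi(\hat\theta_{t-1}, X_t), \quad t \ge 1,$

where $\gamma(\theta)$ is a non-random matrix such that $\gamma^{-1}(\theta)$
exists for any $\theta \in \mathbb{R}^m$ and $\hat\theta_0 \in \mathbb{R}^m$ is
any initial value.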

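Corollary 4.1 Suppose that $\hat\theta_t \to \theta$ ($P^\theta$-a.s.) and

(B1) there exists a symmetric and non-negative definite matrix $C_\theta$ such
that

$\left( C_\theta u, \gamma^{-1}(\theta + u) E_\theta \psi(\theta + u, X_1) \right) \le -\frac{1}{2} (C_\theta u, u)$

for small $u$'s;

(B2) $E_\theta \| \gamma^{-1}(\theta + u) \psi(\theta + u, X_1) \|^2 = O(1)$ as $u \to 0$.

Then $t^{2\delta} (\hat\theta_t - \theta)^{T} C_\theta (\hat\theta_t - \theta) \to 0$
($P^\theta$-a.s.) for any $\delta \in \, ]0, 1/2[$.

Proof. The result follows immediately if we take $a_t(\theta) = t$,
$\mathcal{P}_t = 0$ and $\lambda_t(\theta) = 1/t$ in Corollary 3.1.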

Remark 4.1 As was mentioned in Remark 3.2, for the i.i.d. case the
recursive procedures can be studied in the framework of stochastic
approximation theory. For stochastic approximation procedures of this type,
conditions which guarantee a good rate of convergence are expressed in terms
of stability of matrices. Recall that a matrix $A$ is called stable if the real
parts of its eigenvalues are negative. A standard requirement in stochastic
approximation theory is the existence of the representation (see Remark 3.2
for the notation)

(4.2)   $R^{\theta}(u) = B_\theta u + o(\|u\|) \quad \text{as } u \to 0,$

where the matrix $B_\theta + \frac{1}{2} I$ is stable. It is easy to see that
this assumption implies (B1). Indeed, it follows from the stability of
$B_\theta + \frac{1}{2} I$ that the maximum of the real parts of the eigenvalues
of $B_\theta$ is less than $-1/2$. This implies (see, e.g., Khas'minskii and
Nevelson (1972), Ch. 6, §3, Corollary 3.1) that there exists a symmetric and
positive definite matrix $C_\theta$ such that

$(C_\theta u, B_\theta u) < -\frac{1}{2} (C_\theta u, u),$

which, together with (4.2), implies (B1).



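As a particular example, consider

$f(\theta, x) = \frac{1}{\pi (1 + (x - \theta)^2)},$

the probability density function of the Cauchy distribution with location
parameter $\theta$. Simple calculations show that

$\frac{\partial}{\partial\theta} \log f(\theta, x) = \frac{2 (x - \theta)}{1 + (x - \theta)^2}.$

Now, using tables of standard integrals, it is easy to check that

$i(\theta) = \int \left[ \frac{\partial}{\partial\theta} \log f(\theta, x) \right]^2 f(\theta, x) \, dx = \frac{1}{2}.$

So, a ML recursive procedure is

$\hat\theta_t = \hat\theta_{t-1} + \frac{1}{t} \, \frac{4 (X_t - \hat\theta_{t-1})}{1 + (X_t - \hat\theta_{t-1})^2}, \quad t \ge 1.$

Using tables of standard integrals and simple algebra,

$b(\theta, u) = \frac{2}{\pi} \int \frac{x - u}{(1 + (x - u)^2)(1 + x^2)} \, dx = -\frac{2u}{4 + u^2}$

and

$E_\theta [\psi(\theta + u, X)]^2 = \frac{4}{\pi} \int \frac{(x - u)^2}{(1 + (x - u)^2)^2 (1 + x^2)} \, dx = \frac{2 (4 + 3u^2)}{(4 + u^2)^2}.$

Now, it is easy to check that conditions (I) and (II) of Corollary 4.1 in
Sharia (2006) (or in Sharia (1998)) are satisfied, implying that
$\hat\theta_t \to \theta$ ($P^\theta$-a.s.). Let us check the conditions of
Corollary 4.1 with $\gamma(\theta) = i(\theta) = 1/2$. It follows from the above
calculations that (B2) holds. Then, for arbitrary $0 < \varepsilon < 1/2$ we
have

$\frac{\gamma^{-1}(\theta) b(\theta, u)}{u} = -\frac{4}{4 + u^2} = -1 + \frac{u^2}{4 + u^2} \le -1 + \varepsilon$

for small $u$'s, which yields that (B1) is satisfied with $C_\theta = 1$.
Therefore, $t^{\delta} (\hat\theta_t - \theta) \to 0$ ($P^\theta$-a.s.) for any
$0 < \delta < 1/2$.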
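
The recursive procedure for the Cauchy location parameter is easy to simulate. The following is a minimal sketch, assuming the step $\frac{1}{t} \cdot 4z / (1 + z^2)$ with $z = X_t - \hat\theta_{t-1}$ (that is, the score $2z/(1+z^2)$ divided by the Fisher information $t/2$); the sample size, seed, true value and the function name `cauchy_recursive_mle` are arbitrary choices made for illustration.

```python
import math
import random


def cauchy_recursive_mle(xs, theta0=0.0):
    """Recursive ML-type estimator of the Cauchy location parameter."""
    theta = theta0
    for t, x in enumerate(xs, start=1):
        z = x - theta
        # score 2z/(1+z^2) scaled by the inverse Fisher information 2/t
        theta += (4.0 * z / (1.0 + z * z)) / t
    return theta


rng = random.Random(12345)          # fixed seed for reproducibility
true_theta = 1.0                    # hypothetical true value
# Standard Cauchy variates via the inverse distribution function.
sample = [true_theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(20000)]
estimate = cauchy_recursive_mle(sample, theta0=0.0)
```

Since each increment is bounded by $2/t$ in absolute value, extreme observations cannot move the estimate far, in contrast with the sample mean, which is not consistent for Cauchy data.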

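2. Exponential family of Markov processes. Consider a conditional
exponential family of Markov processes in the sense of Feigin (1981) (see also
Barndorff-Nielsen (1988)). This is a time homogeneous Markov chain with
the one-step transition density

$f(y; \theta, x) = h(x, y) \exp \left( \theta^T m(y, x) - \beta(\theta; x) \right),$

where $m(y, x)$ is an $m$-dimensional vector and $\beta(\theta; x)$ is one
dimensional. Then in our notation $f_t(\theta) = f(X_t; \theta, X_{t-1})$ and

$l_t(\theta) = \frac{d}{d\theta} \log f_t(\theta) = m(X_t, X_{t-1}) - \dot\beta^{T}(\theta; X_{t-1}).$

It follows from standard exponential family theory (see, e.g., Feigin (1981))
that $l_t(\theta)$ is a martingale-difference and the conditional Fisher
information is

$I_t(\theta) = \sum_{s=1}^{t} \ddot\beta(\theta; X_{s-1}).$

So, a maximum likelihood type recursive procedure can be defined as

$\hat\theta_t = \hat\theta_{t-1} + \left( \sum_{s=1}^{t} \ddot\beta(\hat\theta_{t-1}; X_{s-1}) \right)^{-1} \left( m(X_t, X_{t-1}) - \dot\beta^{T}(\hat\theta_{t-1}; X_{t-1}) \right), \quad t \ge 1.$

Let us find the functions appearing in the conditions of our theorems for the
case $\psi_t = l_t$ and $\Gamma_t = I_t$. Since
$E_\theta \{ l_t(\theta) \mid \mathcal{F}_{t-1} \} = 0$, we have

$E_\theta \{ m(X_t, X_{t-1}) \mid \mathcal{F}_{t-1} \} = \dot\beta^{T}(\theta; X_{t-1})$

and also,

$\ddot\beta(\theta; X_{t-1}) = i_t(\theta) = E_\theta \{ l_t(\theta) l_t^{T}(\theta) \mid \mathcal{F}_{t-1} \}$
$= E_\theta \{ m(X_t, X_{t-1}) m^{T}(X_t, X_{t-1}) \mid \mathcal{F}_{t-1} \} - \dot\beta^{T}(\theta; X_{t-1}) \dot\beta(\theta; X_{t-1}),$

which implies that

(4.3)   $E_\theta \{ m(X_t, X_{t-1}) m^{T}(X_t, X_{t-1}) \mid \mathcal{F}_{t-1} \} = \ddot\beta(\theta; X_{t-1}) + \dot\beta^{T}(\theta; X_{t-1}) \dot\beta(\theta; X_{t-1}).$

Now, it is a simple matter to check that

(4.4)   $b_t(\theta, u) = E_\theta \{ l_t(\theta + u) \mid \mathcal{F}_{t-1} \} = \dot\beta^{T}(\theta; X_{t-1}) - \dot\beta^{T}(\theta + u; X_{t-1}).$

Using (4.3) (since $\mathrm{trace}(v v^T) = v^T v$ and
$\mathrm{trace}(A + B) = \mathrm{trace}\, A + \mathrm{trace}\, B$),

$E_\theta \{ \| l_t(\theta + u) \|^2 \mid \mathcal{F}_{t-1} \} = \mathrm{trace}\, \ddot\beta(\theta; X_{t-1}) + \| \dot\beta^{T}(\theta; X_{t-1}) - \dot\beta^{T}(\theta + u; X_{t-1}) \|^2$
(4.5)   $= \mathrm{trace}\, \ddot\beta(\theta; X_{t-1}) + \| b_t(\theta, u) \|^2.$

Using these expressions one can check the conditions of the relevant theorems
for different choices of the functions $m$ and $\beta$.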

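Now suppose that $\theta$ is one dimensional and consider the class of
conditionally additive exponential families, that is,

$f(y; \theta, x) = h(x, y) \exp \left( \theta m(y, x) - \beta(\theta; x) \right),$

with

(4.6)   $\beta(\theta; x) = \gamma(\theta) h(x),$

where $h(\cdot) \ge 0$ and $\ddot\gamma(\cdot) \ge 0$ (see Feigin (1981)). Then,

$I_t(\theta) = \ddot\gamma(\theta) H_t \quad \text{where} \quad H_t = \sum_{s=1}^{t} h(X_{s-1}).$

Assuming that $\ddot\gamma(\theta) \ne 0$, the likelihood recursive procedure is

(4.7)   $\hat\theta_t = \hat\theta_{t-1} + \frac{1}{\ddot\gamma(\hat\theta_{t-1}) H_t} \left( m(X_t, X_{t-1}) - \dot\gamma(\hat\theta_{t-1}) h(X_{t-1}) \right), \quad t \ge 1.$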
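
A single step of (4.7) can be sketched as follows. The function name `additive_family_step` and its argument names are hypothetical. The numerical check uses an illustrative model that is not discussed in the paper: a conditionally Poisson chain with conditional mean $e^{\theta} x$, for which $m(y, x) = y$, $h(x) = x$ and $\gamma(\theta) = e^{\theta}$.

```python
import math
from typing import Callable, Tuple


def additive_family_step(
    theta: float,
    H_prev: float,
    x_prev: float,
    x_curr: float,
    m: Callable[[float, float], float],
    h: Callable[[float], float],
    gamma_dot: Callable[[float], float],
    gamma_ddot: Callable[[float], float],
) -> Tuple[float, float]:
    """One update of (4.7); returns the new estimate and the updated H_t.

    The caller is assumed to ensure gamma_ddot(theta) * H_t != 0.
    """
    H = H_prev + h(x_prev)                                   # H_t = H_{t-1} + h(X_{t-1})
    score = m(x_curr, x_prev) - gamma_dot(theta) * h(x_prev)  # l_t at the old estimate
    return theta + score / (gamma_ddot(theta) * H), H


# Illustrative (assumed) model: m(y, x) = y, h(x) = x, gamma(theta) = exp(theta).
theta_new, H_new = additive_family_step(
    theta=0.0, H_prev=0.0, x_prev=2.0, x_curr=3.0,
    m=lambda y, x: y, h=lambda x: x,
    gamma_dot=math.exp, gamma_ddot=math.exp,
)
```

Only the scalar $H_t$ has to be stored between steps, because the dependence of $I_t(\theta)$ on the past enters through $H_t$ alone.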

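The following result gives sufficient conditions for the convergence of (4.7).

Proposition 4.1 Suppose that $H_t \to \infty$ ($P^\theta$-a.s.) and either
$\dot\gamma$ is a linear function, or the following conditions are satisfied:

(M1) $\frac{h(X_{t-1})}{H_t} \to 0, \quad P^\theta\text{-a.s.};$

(M2) for any finite $a$ and $b$,

$0 < \inf_{u \in [a, b]} \ddot\gamma(u) \le \sup_{u \in [a, b]} \ddot\gamma(u) < \infty;$

(M3) there exists a constant $B$ such that

$\frac{1 + \dot\gamma^2(u)}{\ddot\gamma^2(u)} \le B (1 + u^2)$

for each $u \in \mathbb{R}$.

Then $\hat\theta_t$ defined by (4.7) is strongly consistent (i.e.,
$\hat\theta_t \to \theta$ $P^\theta$-a.s.) for any initial value
$\hat\theta_0$.

Proof. See Appendix B.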

In the next statement we assume that the recursive procedure converges 
and study the rate of convergence. 

Corollary 4.2 Suppose that $\hat\theta_t$ defined by (4.7) is strongly
consistent (i.e., $\hat\theta_t \to \theta$ $P^\theta$-a.s.). Suppose also that

(1) $H_t \to \infty$, $P^\theta$-a.s.;

(2) $\frac{\Delta H_t}{H_{t-1}} \to 0$, $P^\theta$-a.s.;

(3) $\ddot\gamma(\cdot)$ is a continuous positive function.

Then $H_t^{\delta} (\hat\theta_t - \theta) \to 0$ ($P^\theta$-a.s.) for any
$\delta \in \, ]0, 1/2[$.

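Proof. As always (see the convention in Section 2), convergence and all
relations between random variables are meant with probability one w.r.t.
the measure $P^\theta$ unless specified otherwise. By (4.4),

(4.8)   $b_t(\theta, u) = h(X_{t-1}) \left( \dot\gamma(\theta) - \dot\gamma(\theta + u) \right).$

Let us check that the conditions of Corollary 3.1 are satisfied with
$\psi_t(\theta) = l_t(\theta) = m(X_t, X_{t-1}) - \dot\gamma(\theta) h(X_{t-1})$,
$\Gamma_t(\theta) = I_t(\theta) = \ddot\gamma(\theta) H_t$, $a_t(\theta) = H_t$,
$C_\theta = 1$ and
$\mathcal{P}_t = \Gamma_t^{-2}(\theta + \Delta_{t-1}) b_t^2(\theta, \Delta_{t-1})$.
Since $\Delta H_t = h(X_{t-1})$, (R1) is obviously translated into (2). Since
$\dot\gamma(\theta) - \dot\gamma(\theta + u) = -\ddot\gamma(\theta + \tilde u) u$
where $|\tilde u| \le |u|$, the left hand side of (3.4) is

$-2 \Delta_{t-1}^2 \frac{h(X_{t-1})}{H_t} \frac{\ddot\gamma(\theta + \tilde\Delta_{t-1})}{\ddot\gamma(\theta + \Delta_{t-1})} + \Delta_{t-1}^2 \frac{h^2(X_{t-1})}{H_t^2} \left( \frac{\ddot\gamma(\theta + \tilde\Delta_{t-1})}{\ddot\gamma(\theta + \Delta_{t-1})} \right)^2.$

Since $\ddot\gamma(\cdot)$ is continuous and
$\Delta_{t-1} = \hat\theta_{t-1} - \theta \to 0$, for any small
$\varepsilon > 0$ (which may depend on $\theta$),
$1 - \varepsilon < \ddot\gamma(\theta + \tilde\Delta_{t-1}) / \ddot\gamma(\theta + \Delta_{t-1}) < 1 + \varepsilon$
for large $t$'s. So, (3.4) holds with

$\lambda_t(\theta) = 2 (1 - \varepsilon) \frac{h(X_{t-1})}{H_t} - (1 + \varepsilon)^2 \frac{h^2(X_{t-1})}{H_t^2}.$

To check (3.5), consider

(4.9)   $\frac{\Delta H_t}{H_t} - \lambda_t(\theta) = \frac{h(X_{t-1})}{H_t} \left( -1 + 2\varepsilon + (1 + \varepsilon)^2 \frac{h(X_{t-1})}{H_t} \right).$

Now, since $\varepsilon$ is arbitrary, we can assume that
$-1 + 2\varepsilon < 0$. Also, it follows from (2) that
$h(X_{t-1}) / H_t \to 0$. Therefore, (4.9) is negative for large $t$'s,
implying that (3.5) holds true.

To check (R3) note that by (4.5),

(4.10)   $E_\theta \{ l_t^2(\theta + u) \mid \mathcal{F}_{t-1} \} = \ddot\gamma(\theta) h(X_{t-1}) + b_t^2(\theta, u)$

and so,

$E_\theta \{ \Gamma_t^{-2}(\theta + \Delta_{t-1}) \psi_t^2(\theta + \Delta_{t-1}) \mid \mathcal{F}_{t-1} \} - \mathcal{P}_t = \frac{\ddot\gamma(\theta) h(X_{t-1})}{\ddot\gamma^2(\theta + \Delta_{t-1}) H_t^2}.$

Now, (R3) follows from (3) and Proposition A2 in Appendix A.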
A particular example of a conditionally additive exponential family is the
Gaussian autoregressive model defined by

$X_t = \theta X_{t-1} + Z_t, \quad t = 1, 2, \dots,$

where $\theta \in \mathbb{R}$, $X_0 = 0$ and the $Z_t$'s are independent random
variables with the standard normal distribution. In this model
$m(y, x) = xy$ and $\beta(\theta, x) = \frac{1}{2} x^2 \theta^2$, so that we can
assume that $\gamma(\theta) = \theta^2 / 2$ and $h(x) = x^2$. Then

$l_t(\theta) = X_t X_{t-1} - X_{t-1}^2 \theta, \quad H_t = I_t(\theta) = \sum_{s=1}^{t} X_{s-1}^2.$

Therefore,

(4.11)   $\hat\theta_t = \hat\theta_{t-1} + \frac{1}{I_t} \left( X_t X_{t-1} - X_{t-1}^2 \hat\theta_{t-1} \right),$
         $I_t = I_{t-1} + X_{t-1}^2.$

Note that the rate of growth of the conditional Fisher information $I_t$ varies
for the different values of $\theta$. Suppose

(4.12)   $\kappa_t(\theta) = \begin{cases} t (1 - \theta^2)^{-1} & \text{for } |\theta| < 1, \\ t^2 / 2 & \text{for } |\theta| = 1, \\ \theta^{2t} (\theta^2 - 1)^{-2} & \text{for } |\theta| > 1. \end{cases}$

For $|\theta| < 1$, $I_t / \kappa_t(\theta) \to 1$ in probability as
$t \to \infty$, whereas $I_t / \kappa_t(\theta) \to W \sim \chi^2(1)$ almost
surely in the case $|\theta| > 1$ (non-ergodic case). In the case
$|\theta| = 1$, the ratio $I_t / \kappa_t(\theta)$ converges in distribution,
but not in probability (for details, see White (1958) and Anderson (1959)). It
is also well known that $I_t \to \infty$ almost surely for any
$\theta \in \mathbb{R}$ (see, e.g., Shiryayev (1984), Ch. VII, 5.5). Also, since
$\dot\gamma(\theta)$ is linear and $H_t = I_t$, the conditions of Proposition
4.1 are trivially satisfied. Therefore, for any $\theta \in \mathbb{R}$, the
recursive estimator $\hat\theta_t$ is strongly consistent for any choice of the
initial $\hat\theta_0$.

To establish the rate of convergence we assume that the process is (strongly)
stationary and ergodic. So, $|\theta| < 1$ and it follows from the ergodic
theorem for stationary processes that the limit

(4.13)   $\lim_{t \to \infty} \frac{1}{t} I_t$

exists $P^\theta$-a.s. and is finite and nonzero (it can be proved that this
holds without the assumption of strong stationarity). Now, taking
$H_t = I_t$, we obtain that

$\frac{\Delta I_t}{I_{t-1}} = \frac{I_t}{I_{t-1}} - 1 = d_t \frac{t}{t - 1} - 1 \to 0,$

since $d_t = ((t - 1) / I_{t-1}) (I_t / t) \to 1$. This implies that (2) of
Corollary 4.2 holds. (Note that for the non-ergodic case $|\theta| > 1$, we do
not expect (2) to hold since in this case
$\Delta\kappa_t / \kappa_{t-1} = \theta^2 - 1 \not\to 0$.)

So, the conditions of Corollary 4.2 are satisfied, implying that
$t^{\delta} (\hat\theta_t - \theta) \to 0$ for any $0 < \delta < 1/2$.
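
The recursion (4.11) can be sketched as follows. Because $X_0 = 0$ gives $I_1 = 0$, the sketch skips the update while the accumulated information is zero; this guard is an assumption added for the code, not something stated in the text. With this convention the recursion coincides algebraically with the batch least squares estimator $\sum_s X_s X_{s-1} / \sum_s X_{s-1}^2$, whatever the initial value, which the code checks numerically. The function names, seed and sample size are arbitrary.

```python
import random


def simulate_ar1(theta, n, seed=0):
    """Simulate X_t = theta * X_{t-1} + Z_t with X_0 = 0 and standard normal Z_t."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(n):
        xs.append(theta * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


def recursive_ar1(xs, theta0=0.0):
    """Recursion (4.11); the update is skipped while I_t = 0 (an added guard)."""
    theta, info = theta0, 0.0
    for t in range(1, len(xs)):
        info += xs[t - 1] ** 2                      # I_t = I_{t-1} + X_{t-1}^2
        if info > 0.0:
            theta += (xs[t] * xs[t - 1] - xs[t - 1] ** 2 * theta) / info
    return theta, info


def batch_least_squares(xs):
    """Batch estimator sum X_t X_{t-1} / sum X_{t-1}^2, used for comparison."""
    num = sum(xs[t] * xs[t - 1] for t in range(1, len(xs)))
    den = sum(xs[t - 1] ** 2 for t in range(1, len(xs)))
    return num / den


path = simulate_ar1(theta=0.5, n=5000, seed=1)
theta_rec, info = recursive_ar1(path, theta0=3.0)
theta_ls = batch_least_squares(path)
```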



APPENDIX A 

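Lemma A1 Let $\mathcal{F}_0, \mathcal{F}_1, \dots$ be a non-decreasing sequence
of $\sigma$-algebras and let $X_n, \beta_n, \xi_n, \zeta_n \in \mathcal{F}_n$,
$n \ge 0$, be nonnegative r.v.'s such that

$E(X_n \mid \mathcal{F}_{n-1}) \le X_{n-1} (1 + \beta_{n-1}) + \xi_{n-1} - \zeta_{n-1}, \quad n \ge 1,$

eventually. Then

$\left\{ \sum_{i=1}^{\infty} \xi_{i-1} < \infty \right\} \cap \left\{ \sum_{i=1}^{\infty} \beta_{i-1} < \infty \right\} \subset \{ X \to \} \cap \left\{ \sum_{i=1}^{\infty} \zeta_{i-1} < \infty \right\} \quad (P\text{-a.s.}),$

where $\{ X \to \}$ denotes the set where $\lim_{n \to \infty} X_n$ exists and
is finite.

Remark. The proof can be found in Robbins and Siegmund (1971). Note also
that this lemma is a special case of the theorem on the convergence sets of
nonnegative semimartingales (see, e.g., Lazrieva, Sharia, and Toronjadze
(1997)).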

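Proposition A2 If $d_n$ is a nondecreasing sequence of positive numbers such
that $d_n \to +\infty$, then, with $\Delta d_n = d_n - d_{n-1}$,

$\sum_{n=1}^{\infty} \Delta d_n / d_n = +\infty$

and

$\sum_{n=1}^{\infty} \Delta d_n / d_n^{1 + \varepsilon} < +\infty$

for any $\varepsilon > 0$.

Proof. The first claim is easily obtained by contradiction from the Kronecker
lemma (see, e.g., Lemma 2, §3, Ch. IV in Shiryayev (1984)). The second one
is proved by the following argument:

$\sum_{n=1}^{N} \frac{\Delta d_n}{d_n^{1 + \varepsilon}} \le \sum_{n=1}^{N} \int_{d_{n-1}}^{d_n} \frac{dx}{x^{1 + \varepsilon}} = \frac{1}{\varepsilon} \left( \frac{1}{d_0^{\varepsilon}} - \frac{1}{d_N^{\varepsilon}} \right) \le \frac{1}{\varepsilon d_0^{\varepsilon}}.$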
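
Both claims of Proposition A2 can be illustrated numerically. The sketch below uses the arbitrary choice $d_n = n + 1$ (so $d_0 = 1$) and $\varepsilon = 1/2$; the function name `partial_sums` is hypothetical. It compares the second partial sum with the bound $1 / (\varepsilon d_0^{\varepsilon})$ from the proof, while the first partial sum keeps growing (roughly like $\log N$ for this sequence), which of course only illustrates the divergence and does not prove it.

```python
def partial_sums(d, eps):
    """Return (sum of Delta d_n / d_n, sum of Delta d_n / d_n^(1+eps)) over n = 1..N."""
    plain, weighted = 0.0, 0.0
    for n in range(1, len(d)):
        inc = d[n] - d[n - 1]            # Delta d_n
        plain += inc / d[n]
        weighted += inc / d[n] ** (1.0 + eps)
    return plain, weighted


d = [float(n + 1) for n in range(0, 100001)]   # d_0 = 1, d_n = n + 1
eps = 0.5
s1, s2 = partial_sums(d, eps)
bound = 1.0 / (eps * d[0] ** eps)              # bound from the integral comparison
```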
APPENDIX B



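Theorem B1 (Sharia (2007), Theorem 3.2) Suppose that for
$\theta \in \mathbb{R}^m$ there exists a real valued nonnegative function
$V_\theta(u) : \mathbb{R}^m \to \mathbb{R}$ having continuous and bounded
partial second derivatives and

(G1) $V_\theta(0) = 0$, and for each $\varepsilon \in (0, 1)$,

$\inf_{\|u\| \ge \varepsilon} V_\theta(u) > 0;$

(G2) there exists a set $A \in \mathcal{F}$ with $P^\theta(A) > 0$ such that
for each $\varepsilon \in (0, 1)$,

$\sum_{t=1}^{\infty} \inf_{\varepsilon \le V_\theta(u) \le 1/\varepsilon} [\mathcal{N}_t(u)]^{-} = \infty$

on $A$, where

$\mathcal{N}_t(u) = \dot V_\theta(u) \Gamma_t^{-1}(\theta + u) E_\theta \{ \psi_t(\theta + u) \mid \mathcal{F}_{t-1} \}$
$+ \frac{1}{2} \sup_v \| \ddot V_\theta(v) \| \, E_\theta \{ \| \Gamma_t^{-1}(\theta + u) \psi_t(\theta + u) \|^2 \mid \mathcal{F}_{t-1} \};$

(G3) for $\Delta_t = \hat\theta_t - \theta$,

$\sum_{t=1}^{\infty} (1 + V_\theta(\Delta_{t-1}))^{-1} [\mathcal{N}_t(\Delta_{t-1})]^{+} < \infty, \quad P^\theta\text{-a.s.}$

Then $\hat\theta_t \to \theta$ ($P^\theta$-a.s.) for any initial value
$\hat\theta_0$, where $\hat\theta_t$ is defined by (3.1).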

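Proof of Proposition 4.1. As always (see the convention in Section 2),
convergence and all relations between random variables are meant with
probability one w.r.t. the measure $P^\theta$ unless specified otherwise. Let us
check that the conditions of Theorem B1 above are satisfied with
$\psi_t(\theta) = l_t(\theta) = m(X_t, X_{t-1}) - \dot\gamma(\theta) h(X_{t-1})$,
$\Gamma_t(\theta) = I_t(\theta) = \ddot\gamma(\theta) H_t$, and
$V_\theta(u) = u^2$. Using (4.8) and (4.10), we have

$\mathcal{N}_t(u) = \frac{2u}{\ddot\gamma(\theta + u) H_t} b_t(\theta, u) + \frac{1}{\ddot\gamma^2(\theta + u) H_t^2} E_\theta \{ l_t^2(\theta + u) \mid \mathcal{F}_{t-1} \}$
$= u \frac{h(X_{t-1})}{H_t} \frac{\dot\gamma(\theta) - \dot\gamma(\theta + u)}{\ddot\gamma(\theta + u)} \left( 2 + \frac{h(X_{t-1})}{H_t} \frac{\dot\gamma(\theta) - \dot\gamma(\theta + u)}{u \ddot\gamma(\theta + u)} \right) + \frac{h(X_{t-1})}{H_t^2} \frac{\ddot\gamma(\theta)}{\ddot\gamma^2(\theta + u)}$
(4.14)   $=: \mathcal{N}_{1t}(u) + \mathcal{N}_{2t}(u),$

with the convention that $0/0 = 0$. Let us show that for large $t$'s,

(4.15)   $\frac{h(X_{t-1})}{H_t} \left| \frac{\dot\gamma(\theta) - \dot\gamma(\theta + u)}{u \ddot\gamma(\theta + u)} \right| \le 1.$

If $\dot\gamma$ is linear, the above inequality trivially holds since
$h(X_{t-1}) / H_t = \Delta H_t / H_t \le 1$. For a non-linear case we have
(assuming that $u \ne 0$),

(4.16)   $\left| (\dot\gamma(\theta) - \dot\gamma(\theta + u)) / (u \ddot\gamma(\theta + u)) \right| = \ddot\gamma(\theta + \tilde u) / \ddot\gamma(\theta + u),$

where $|\tilde u| \le |u|$. Suppose now that $|u| \le M$ where
$0 < M < \infty$. Then it follows from (M2) that the left hand side of (4.16) is
bounded by some positive constant. Also, using the obvious inequality
$(a - b)^2 \le 2a^2 + 2b^2$ and (M3), we obtain that
$(\dot\gamma(\theta) - \dot\gamma(\theta + u))^2 / \ddot\gamma^2(\theta + u) \le B (1 + u^2)$
for any $u$ (where $B$ may depend on $\theta$). So, the left hand side of (4.16)
is less than or equal to $\sqrt{B (1 + u^2) / u^2} = \sqrt{B (1/u^2 + 1)}$,
which is bounded by a positive constant if $|u| > M$. So, the left hand side of
(4.16) is bounded by a constant (which may depend on $\theta$) for any $u$. So,
because of (M1) it follows that (4.15) holds for large $t$'s. This implies that
$\mathcal{N}_{1t}(u) \le 0$ for large $t$'s (recall that $\ddot\gamma(\cdot)$ is
positive). So, using (M3) we obtain that for large $t$'s,

$(1 + \Delta_{t-1}^2)^{-1} [\mathcal{N}_t(\Delta_{t-1})]^{+} \le (1 + \Delta_{t-1}^2)^{-1} \mathcal{N}_{2t}(\Delta_{t-1}) \le B_1 \frac{h(X_{t-1})}{H_t^2}$

for some constant $B_1$ which may depend on $\theta$. Now, since
$\sum_{t=1}^{\infty} h(X_{t-1}) / H_t^2 < \infty$ (see Proposition A2 in
Appendix A), condition (G3) of Theorem B1 is satisfied. To check condition
(G2), note that $(\dot\gamma(\theta) - \dot\gamma(\theta + u)) u \le 0$, use the
obvious inequality $[x]^{-} \ge -x$, and (4.15) to obtain that for large $t$'s

$[\mathcal{N}_t(u)]^{-} \ge -\mathcal{N}_t(u) \ge -\frac{h(X_{t-1})}{H_t} \frac{\dot\gamma(\theta) - \dot\gamma(\theta + u)}{\ddot\gamma(\theta + u)} u - \mathcal{N}_{2t}(u)$
$= \frac{h(X_{t-1})}{H_t} \frac{\ddot\gamma(\theta + \tilde u)}{\ddot\gamma(\theta + u)} u^2 - \frac{h(X_{t-1})}{H_t^2} \frac{\ddot\gamma(\theta)}{\ddot\gamma^2(\theta + u)},$

where $|\tilde u| \le |u|$. Then, it follows from (M2) that
$\sup_{\varepsilon \le |u| \le 1/\varepsilon} \ddot\gamma(\theta) / \ddot\gamma^2(\theta + u) \le R$
and
$\inf_{\varepsilon \le |u| \le 1/\varepsilon} \ddot\gamma(\theta + \tilde u) u^2 / \ddot\gamma(\theta + u) \ge r > 0$
(where the positive constants $R$ and $r$ may depend on $\varepsilon$). Note also
that these inequalities trivially hold for the linear case. Therefore, using
once more Proposition A2 in Appendix A we obtain that
$\sum_{t=1}^{\infty} \inf_{\varepsilon \le |u| \le 1/\varepsilon} [\mathcal{N}_t(u)]^{-} = \infty$,
which completes the proof.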